Topic: An Analysis of Food Security in the USA - 2021


Introduction

Although food is a basic human right, many people around the world live in highly food-insecure conditions. According to the World Bank, domestic food price inflation rose by at least 5 percent globally after the pandemic hit. According to United Nations news, 828 million people were affected by hunger in 2021. It is important to analyse which factors affect food security and to take measures to reduce world hunger. This is one of the reasons we were interested in studying food security in the USA. Our study has limitations, however. Food security is a very complex issue that needs thorough research from multiple angles, and we have chosen only certain socio-economic variables out of the 507 available.


Our team’s research topic is the situation of food security in 2021. We want to know how different demographic and socioeconomic factors relate to food security.

We are using the Current Population Survey - Food Security Supplement (December 2021) data provided by the US Census Bureau.

The dataset contains 507 variables and roughly 120,000 observations.


The SMART questions we have proposed and hope to answer are:

Specific: Study the specific patterns in the data that affect food security, such as state, county, income level, whether the family uses SNAP, race, immigrant status, work status, education level, and other demographic and socio-economic variables.

Measurable: Use EDA techniques to determine how significantly different factors contribute to food insecurity.

Achievable: Find the variables that significantly affect food insecurity and create models for ensuring food security in households.

Relevant: Food being a basic requirement of every human, this study can shed light on what the authorities and we ourselves can do to eradicate food insecurity.

Time-oriented: The data set for December 2021 is used so that the study can also show the effect of Covid-19 on food security.


Considering the questions we are asking, we decided to select just 11 factors to work with.

A very significant limitation of our data is that we trimmed off many observations where the interview was either not taken or not completed. Ideally we should account for these observations somehow, but due to time constraints we are not doing so.


## 'data.frame':    71472 obs. of  12 variables:
##  $ Id                : Factor w/ 27922 levels "5185410966","8178510165",..: 16600 9378 9378 8472 8472 7861 7861 19375 19375 24604 ...
##  $ States            : Factor w/ 51 levels "1","2","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Family_Size       : Factor w/ 14 levels "1","2","3","4",..: 1 2 2 2 2 2 2 2 2 1 ...
##  $ Household_Income  : Factor w/ 16 levels "1","2","3","4",..: 16 14 14 12 12 13 13 9 9 11 ...
##  $ SNAP              : Factor w/ 5 levels "-3","-2","-1",..: 3 3 3 5 5 3 3 5 5 3 ...
##  $ Ethnicity         : Factor w/ 24 levels "1","2","3","4",..: 1 1 1 1 1 1 1 2 2 1 ...
##  $ Citizenship_status: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Number_of_Jobs    : Factor w/ 4 levels "-1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Hours_on_Jobs     : Factor w/ 88 levels "-4","-1","0",..: 67 43 2 43 43 62 43 2 2 2 ...
##  $ Education_Level   : Factor w/ 17 levels "-1","31","32",..: 14 15 1 14 14 10 10 7 10 5 ...
##  $ FoodSecurity_score: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 2 2 1 ...
##  $ PRNMCHLD          : Factor w/ 12 levels "0","1","2","3",..: 1 2 1 1 1 1 1 1 1 1 ...

Our response variable is Food Security, which has four levels:


High Food Security: No reported indications of food-access problems or limitations.

Marginal Food Security: One or two reported signs, usually anxiety over food availability or scarcity in the home. There is little to no evidence that diets or food intake have changed.

Low Food Security: Reports of reduced quality, variety, or desirability of diet. Little or no indication of reduced food intake.

Very Low Food Security: Reports of numerous signs of altered eating habits and decreased food intake.



Ethnicity


  • We have 25 ethnicities in this data. We will explore the relationship between Ethnicity and Food Security graphically, and do some statistical testing to confirm that relationship

Plotting barcharts of food security status for each ethnicity

Statistical Testing


  • We use Fisher’s Exact Test instead of the Chi-square test because numerous levels have a low frequency of observations

  • Our Null Hypothesis is that Ethnicity and Food Security Status are Independent of each other.

  • Taking our alpha to be 5%


## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  FS_Subset$Ethnicity and FS_Subset$FoodSecurity_score
## p-value = 0.0004998
## alternative hypothesis: two.sided

  • Since the p-value is less than our chosen alpha, we reject the null hypothesis and conclude that there is a statistically significant relationship between Ethnicity and Food Security
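The "simulated p-value (based on 2000 replicates)" in the output above comes from a Monte Carlo procedure. The following is a minimal Python sketch of the same idea, not the code used in this analysis. It assumes a permutation scheme that shuffles one variable (which keeps both margins fixed) and uses the Pearson chi-square statistic as the measure of extremeness, which is a swapped-in choice. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(xs, ys):
    """Pearson chi-square statistic for two equal-length lists of category labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def simulated_p_value(xs, ys, replicates=2000, seed=0):
    """Monte Carlo p-value for independence: shuffle ys, keeping both margins fixed."""
    rng = random.Random(seed)
    observed = chi_square_stat(xs, ys)
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)
        if chi_square_stat(xs, shuffled) >= observed - 1e-9:
            at_least_as_extreme += 1
    # The observed table counts as one replicate, so p can never be exactly 0.
    return (1 + at_least_as_extreme) / (replicates + 1)


# Hypothetical toy data, not taken from the survey.
ethnicity = ["A"] * 50 + ["B"] * 50
food_security = ["High"] * 50 + ["Low"] * 50
p = simulated_p_value(ethnicity, food_security, replicates=200)

# Smallest p-value that 2000 replicates can produce.
smallest_possible = 1 / (2000 + 1)
```

With 2000 replicates the smallest attainable p-value is 1/2001, about 0.0004998, which is exactly the value reported above. The reported figure therefore means that none of the simulated tables was as extreme as the observed one, and it should be read as "p is at most about 0.0005" rather than as a precise estimate.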

Citizenship


  • We have 5 levels of Citizenship status in this data. We will explore the relationship between Citizenship and Food Security graphically, and do some statistical testing to confirm that relationship

Plotting barcharts of food security status for each Citizenship status

Statistical Testing


  • We are using the Chi-square test

  • Our Null Hypothesis is that Citizenship Status and Food Security Status are Independent of each other.

  • Taking our alpha to be 5%


## 
##  Pearson's Chi-squared test
## 
## data:  FS_Subset$Citizenship_status and FS_Subset$FoodSecurity_score
## X-squared = 437.62, df = 12, p-value < 2.2e-16

  • Since the p-value is less than our chosen alpha, we reject the null hypothesis and conclude that there is a statistically significant relationship between Citizenship Status and Food Security
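To make the test above concrete, here is a minimal Python sketch of how the Pearson chi-square statistic and its degrees of freedom are computed from a contingency table. The function name and the 2 x 2 counts are hypothetical, since the actual citizenship table is not shown in this report. The only figures taken from the report are the number of levels (5 citizenship levels, 4 food security levels) and the reported statistic of 437.62.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 counts, for illustration only.
toy_stat, toy_df = chi_square_independence([[10, 20], [20, 10]])

# 5 citizenship levels by 4 food security levels gives the df in the output above.
citizenship_df = (5 - 1) * (4 - 1)

# Standard 5% critical value of the chi-square distribution with 12 df.
CRITICAL_5PCT_DF12 = 21.026
reject_null = 437.62 > CRITICAL_5PCT_DF12
```

The reported statistic is far above the 5% critical value for 12 degrees of freedom, which is consistent with the very small p-value in the output.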

SNAP


  • SNAP stands for Supplemental Nutrition Assistance Program
  • The SNAP factor had 5 levels in this dataset
  • We dropped the level indicating that the observation is not in universe
Plotting Stacked barcharts between SNAP status and food security status

Statistical Testing


  • We are using the Chi-square test

  • Our Null Hypothesis is that SNAP Status and Food Security Status are Independent of each other.

  • Taking our alpha to be 5%


## 
##  Pearson's Chi-squared test
## 
## data:  chi_test_SNAP
## X-squared = 764.1, df = 3, p-value < 2.2e-16

  • Since the p-value is less than our chosen alpha, we reject the null hypothesis and conclude that there is a statistically significant relationship between SNAP Status and Food Security

Odds Ratio

##              Outcome +    Outcome -      Total        Inc risk *        Odds
## Exposed +         4471         2737       7208              62.0        1.63
## Exposed -        14258         4008      18266              78.1        3.56
## Total            18729         6745      25474              73.5        2.78
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc risk ratio                                 0.79 (0.78, 0.81)
## Odds ratio                                     0.46 (0.43, 0.49)
## Attrib risk in the exposed *                   -16.03 (-17.30, -14.76)
## Attrib fraction in the exposed (%)            -25.84 (-28.34, -23.40)
## Attrib risk in the population *                -4.54 (-5.34, -3.73)
## Attrib fraction in the population (%)         -6.17 (-6.68, -5.66)
## -------------------------------------------------------------------
## Uncorrected chi2 test that OR = 1: chi2(1) = 682.162 Pr>chi2 = <0.001
## Fisher exact test that OR = 1: Pr>chi2 = <0.001
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 population units

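  • The odds ratio is 0.46, with a 95% confidence interval of (0.43, 0.49)
  • This means that the odds of being food secure for a person on SNAP are 0.46 times the odds for a person not on SNAP. Equivalently, the odds of a person not on SNAP being food secure are about 2.17 times (1 / 0.46) the odds of a person on SNAP being food secure.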
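The odds ratio and its Wald interval can be recomputed by hand from the four cell counts in the table above. The sketch below is a minimal Python illustration, not the code used in this analysis. It assumes that "Exposed +" means on SNAP and "Outcome +" means food secure, which is inferred from the interpretation above; the function name is hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table [[a, b], [c, d]]."""
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Counts copied from the table above: rows are exposed and unexposed,
# columns are outcome + and outcome -.
or_hat, ci_low, ci_high = odds_ratio_wald(4471, 2737, 14258, 4008)

# Reciprocal: odds for the unexposed group relative to the exposed group.
inverse_or = 1 / or_hat
```

Rounded to two decimals, this gives the same odds ratio and interval as the output. The reciprocal of the unrounded estimate is about 2.18, close to the 2.17 obtained from the rounded value of 0.46.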

To study Number of Jobs, Education Level, and Hours on Jobs per week, we create another dataset and continue the analysis with it.

We then rename the responses of each variable according to the technical documentation for this data.

The summary of the education level of respondents is as follows:

The summary of the number of jobs of the respondents is as follows:

Number of Jobs

Now we conduct an in-depth EDA on the Number of Jobs of the respondents

The graph above shows that the majority of respondents were not eligible to answer this question, so their response is marked as Not Applicable.

This obscures the analysis of the remaining responses. We therefore remove the Not Applicable responses from the variable before continuing with the study.


The graphs above show how many people in each category of the variable fall under each food security score.

Although the majority of respondents report high food security irrespective of the number of jobs, people who have 4 or more jobs include a noticeable number of food insecure respondents. Whether the two variables are actually dependent needs to be checked with a proper statistical test. For this we use the Chi-square test.

## 
##  Pearson's Chi-squared test
## 
## data:  contable_number_of_jobs
## X-squared = 8.4874, df = 6, p-value = 0.2045

The test gave a warning because the counts in some cells of the contingency table are very low. The p-value of the Chi-square test is 0.2045, which is greater than the alpha of 0.05. Hence we fail to reject the null hypothesis: Number of Jobs does not show a statistically significant relationship with Food Security.

The test showed that, although the proportions in the graphs vary, there is no significant dependency between these two variables.
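A warning of this kind usually points to small expected counts, which make the chi-square approximation less reliable. A common rule of thumb is to flag cells with an expected count below 5. The Python sketch below shows how such cells can be found. It is not the code used in this analysis: the function names are hypothetical and the counts are invented, since the actual Number of Jobs table is not shown in this report.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_cells(table, threshold=5):
    """Return (row, column) positions whose expected count is below the threshold."""
    return [
        (i, j)
        for i, row in enumerate(expected_counts(table))
        for j, value in enumerate(row)
        if value < threshold
    ]


# Hypothetical counts: rows are 2, 3 and 4 or more jobs; columns are the four
# food security levels. These numbers are made up for illustration.
jobs_table = [
    [900, 60, 30, 10],
    [90, 8, 4, 2],
    [20, 3, 2, 1],
]
flagged = small_expected_cells(jobs_table)
```

When many cells are flagged, options include merging sparse categories or using an exact or simulated test, as was done for Ethnicity earlier in this report.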

Education Level

The level of education is very important. Education is generally assumed to be a powerful tool for eradicating poverty and hunger. Assuming that this also holds for food security, we examine the variable through EDA. The variable had many response categories.

As an initial setup step, the “Not Applicable” responses were removed from the data, as instructed in the technical documentation.

Since the responses fall into many categories, it is useful to look at their frequency table.

The frequency table gave us an outline of the sample we are using. Most of the people in the survey are educated to high school level or above, which is promising for the overall growth of society.

Even though the frequency table showed a mostly educated sample, the food security scores of these people still need to be studied. The first tool we use for this is graphical representation.


These three graphs give a clear picture of the sample for this variable. The first graph shows that a major proportion of the population has a high school education or more. The next two graphs show how each category divides across food security scores. Although high food security makes up the largest share in every category, the number of people with low food security clearly decreases as the education level increases.

This hypothesis needs to be supported by appropriate statistical tests. Since both variables under study are categorical, we use the Chi-square test to test their independence.

The Chi-Square test results are as follows:

## 
##  Pearson's Chi-squared test
## 
## data:  contable_edu
## X-squared = 3136.8, df = 48, p-value < 2.2e-16

The test shows that the p-value is less than 0.05, the default alpha, so we can say that Education_Level has a significant association with the food security of people. This is an important factor to keep in mind, since giving people more education may act as part of the solution, which gives us hope. With the majority of the population educated or becoming educated, this variable is especially important.

Hours on Jobs

The hours worked per week vary from person to person. The following EDA studies how hours on the job relate to the food security score.

A few respondents reported that their hours vary and that they have no fixed working time. To meet the assumptions of the test and to simplify the analysis, those responses were removed. Removing them made the variable continuous, with an acceptable range of responses from 0 to 99.

## [1] 2

The results above show that the responses ranged between 1 and 88. Although the average is around 20 hours of work per week, the most common response (the mode, shown above) was 2 hours. This is surprising and needs further verification. One possible reason for a mode of 2 hours per week is that many respondents were students. Another possibility is a coding artifact: Hours_on_Jobs is stored as a factor with 88 levels in the structure listing, so if it was converted to numbers directly, the values 1 to 88 and the mode of 2 would be level positions rather than hours (position 2 corresponds to the level "-1"). Further study is needed on this.
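One thing worth checking here is whether the summary was computed on hours or on factor level positions, because Hours_on_Jobs appears as a factor in the structure listing near the top of this report. The Python sketch below imitates how a factor stores values as positions in a sorted level list and shows how the mode can change once the positions are mapped back to labels. The responses are invented, and the meanings given to -4 and -1 are assumptions.

```python
from collections import Counter

# Hypothetical responses; -4 and -1 stand for special codes such as
# "hours vary" and "not applicable" (assumed meanings).
responses = [40, -1, 40, 35, -1, -1, -4, 20, 40, -1]

# A factor stores each value as a 1-based position in its sorted level list.
levels = sorted(set(responses))  # [-4, -1, 20, 35, 40]
codes = [levels.index(value) + 1 for value in responses]

# Mode of the stored positions: this is a level position, not a number of hours.
mode_of_codes = Counter(codes).most_common(1)[0][0]

# Map positions back to labels, then drop the special negative codes.
hours = [levels[code - 1] for code in codes if levels[code - 1] >= 0]
mode_of_hours = Counter(hours).most_common(1)[0][0]
mean_hours = sum(hours) / len(hours)
```

In this toy example the mode of the positions is 2, which stands for the label -1, while the mode of the actual hours is 40. If the same thing happened in the analysis, the reported mode of 2 and the range of 1 to 88 would not be hours at all.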


The graphs above depict the distribution of the data for this variable.

The following is a bargraph for the same variable.

From the graph, we can see that, although the boxes lie around the same region, a few of the classes show an evident difference. This needs to be checked. Whether hours on the job differ by food security score can be tested with a one-way ANOVA, and a post-hoc test then identifies which groups differ.

##                       Df   Sum Sq Mean Sq F value Pr(>F)    
## FoodSecurity_score     3   279623   93208   214.6 <2e-16 ***
## Residuals          71468 31045414     434                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Hours_on_Jobs ~ FoodSecurity_score, data = food_hoj)
## 
## $FoodSecurity_score
##                                                     diff       lwr        upr
## Marginal Food Security-High Food Security     -4.2140290 -4.972646 -3.4554123
## Low Food Security-High Food Security          -5.6863555 -6.488530 -4.8841810
## Very Low Food Security-High Food Security     -6.0702228 -7.206149 -4.9342962
## Low Food Security-Marginal Food Security      -1.4723264 -2.531399 -0.4132541
## Very Low Food Security-Marginal Food Security -1.8561937 -3.186036 -0.5263518
## Very Low Food Security-Low Food Security      -0.3838673 -1.739029  0.9712947
##                                                   p adj
## Marginal Food Security-High Food Security     0.0000000
## Low Food Security-High Food Security          0.0000000
## Very Low Food Security-High Food Security     0.0000000
## Low Food Security-Marginal Food Security      0.0020125
## Very Low Food Security-Marginal Food Security 0.0019070
## Very Low Food Security-Low Food Security      0.8860333

From the results, the large F value and small p-value give us enough confidence to say that mean hours of work differ significantly across food security levels. The post-hoc Tukey test confirms that the pairs Marginal-High, Low-High, and Very Low-High are significantly different. Low-Marginal and Very Low-Marginal also differ significantly, though less strongly (adjusted p-values of about 0.002). The difference between Very Low and Low Food Security is not significant (adjusted p = 0.886). These results indicate that hours on the job per week are related to the food insecurity that respondents face.
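The F value in the ANOVA table is the ratio of the between-group mean square to the residual mean square. The Python sketch below computes these quantities from raw groups and then redoes the arithmetic with the sums of squares printed in the table above. It is an illustration under stated assumptions, not the code used in this analysis: the function name is hypothetical and the three small groups are invented.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_value) for lists of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value


# Hypothetical weekly hours for three groups, for illustration only.
toy = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Reproduce the F value from the sums of squares printed in the table above.
reported_f = (279623 / 3) / (31045414 / 71468)
```

The recomputed value rounds to 214.6, matching the table. Note that the residual degrees of freedom (71468) plus the 4 groups equal 71472, the full number of observations in the structure listing.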


States, Family Size, and Household Income

We wanted to check whether the State variable has an impact on food security. However, because each state had a different number of respondents, it was difficult to base an analysis on the State variable alone. For example, California has the highest number of respondents (6975), whereas Maine has the smallest (564), a more than tenfold difference. To make comparisons, we chose pairs of states with similar numbers of respondents: Alabama (1207) versus Washington DC (1207), Florida (2738) versus New York (2580), and Illinois (2052) versus Pennsylvania (1928). Looking at the graphs, Alabama has more food insecurity than DC, while Florida and New York have similar food security levels, as do Illinois and Pennsylvania.

Plotting barcharts between all the levels of state and food security status


Reference for the household income codes:

   1  Less than $5,000
   2  $5,000 to $7,499
   3  $7,500 to $9,999
   4  $10,000 to $12,499
   5  $12,500 to $14,999
   6  $15,000 to $19,999
   7  $20,000 to $24,999
   8  $25,000 to $29,999
   9  $30,000 to $34,999
  10  $35,000 to $39,999
  11  $40,000 to $49,999
  12  $50,000 to $59,999
  13  $60,000 to $74,999
  14  $75,000 to $99,999
  15  $100,000 to $149,999
  16  $150,000 or more

## 
##  Pearson's Chi-squared test
## 
## data:  income_t
## X-squared = 9512.9, df = 45, p-value < 2.2e-16

The test indicates that household income is significantly associated with food security. Households with income below $20,000 are more likely to be highly food insecure, and households with income between $20,000 and $40,000 tend to have low food security.
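The income bands used in this discussion can be derived from the 16 income codes in the reference list above. The Python sketch below shows one way to collapse the codes into three bands. The band boundaries come from the text, while the function name and band labels are hypothetical.

```python
# Lower bounds in dollars for income codes 1 to 16, taken from the reference list above.
INCOME_LOWER_BOUNDS = {
    1: 0, 2: 5000, 3: 7500, 4: 10000, 5: 12500, 6: 15000, 7: 20000, 8: 25000,
    9: 30000, 10: 35000, 11: 40000, 12: 50000, 13: 60000, 14: 75000,
    15: 100000, 16: 150000,
}


def income_bracket(code):
    """Collapse a Household_Income code into the three bands discussed in the text."""
    lower = INCOME_LOWER_BOUNDS[code]
    if lower < 20000:
        return "under $20,000"
    if lower < 40000:
        return "$20,000 to $39,999"
    return "$40,000 or more"


brackets = [income_bracket(code) for code in (6, 7, 10, 11, 16)]
```

Codes 1 to 6 fall in the lowest band, codes 7 to 10 in the middle band, and codes 11 to 16 in the highest band. Grouping the codes this way before tabulating against the food security score would make the pattern described above easier to see.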

## Warning in chisq.test(family_t): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  family_t
## X-squared = 1691.7, df = 39, p-value < 2.2e-16

The test indicates that family size is significantly associated with food security. As family size gets bigger, the household is more likely to have very low food security.


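As the boxplot shows, food insecurity is high when family size is large (more than 6 people). Household income also has a direct relationship with food security: when household income is higher than $40,000, the food security score is low, which corresponds to higher food security, since the score runs from 1 (high food security) to 4 (very low food security).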


An interesting result in this graph is that when family size is bigger, household income is high and the family has high food security. Taken separately, family size and household income each have a significant relationship with food security; when they are combined, however, the result is different. For further analysis, we need to consider the age and employment type of the family members.

Conclusion

Ethnicity, citizenship, participation in SNAP, education level, hours of work, household income, and family size have a significant relationship with the food security score. Of all the chosen variables, these 7 showed a relationship with food security. Keeping in mind possible errors in the survey data, however, the other variables should be ruled out only after further study.

Future Scope

The results of the study also show that further analysis could produce models to predict the food security score and thereby reveal the sectors that officials should focus on in order to eradicate food insecurity. The initial results show that 10% of the population is still food insecure in the USA, which accounts for about 33 million people. Food being the very basic need of any living being, everyone should have access to safe and proper food. This underlines the significance of the study's results and the need for further studies.